Abstract

Finding heavy elements (heavy hitters) in streaming data is one of the central and best-understood
tasks in the area. Nevertheless, in the sliding window model of streaming (where elements eventually
expire), the problem of finding $L_2$-heavy elements has remained completely open, despite multiple
papers and considerable success in finding $L_1$-heavy elements. In this paper, we develop the first
poly-logarithmic-memory algorithm for finding $L_2$-heavy elements in the sliding window model (that
is, finding elements that appear more times than a specified threshold of the $L_2$ frequency norm).
This is especially significant since finding $L_2$-heavy elements has far greater applicability in many
practical scenarios and is considered a more central question. What is especially interesting about
our solution is that finding $L_2$-heavy elements is not a "smooth" function, and thus previous methods
of working with sliding windows via smooth histograms (which in turn generalize exponential
histograms) are not applicable.

To obtain our results on $L_2$-heavy elements, we "merely" combine (but in a non-trivial and
novel way) existing techniques from the literature. In fact, variants of techniques sufficient to
derive similar results have been known since 2002, yet no algorithm for $L_2$-heavy elements was reported.
Since $L_2$-heavy elements play a central role in many fundamental streaming problems (such as
frequency moments), we believe our method will be extremely useful for many sliding-window
algorithms and applications. For example, our technique allows us to find not only $L_2$-heavy
elements, but also heavy elements with respect to any $L_p$ for $0 < p \le 2$ on sliding windows. Thus,
our paper completely resolves the question of finding $L_p$-heavy elements on sliding windows with
poly-logarithmic memory for all values of $p$, since it is well known that for $p > 2$ this task is
impossible.

Our method may have other applications as well. We demonstrate a broader applicability of
our novel yet simple method on two additional examples: we show how to obtain a sliding window
approximation of other properties such as the similarity of two streams, or the fraction of elements
that appear exactly a specified number of times within the window (the $\alpha$-rarity problem). In these
two illustrative examples of our method, we replace the current expected memory bounds with worst-case
bounds.
1 Introduction



A data stream $S$ is an ordered multiset of elements $\{a_0, a_1, a_2, \ldots\}$ where each element $a_t \in \{1, \ldots, u\}$
arrives at time $t$. In the sliding window model we consider at each time $t \ge N$ the last $N$ elements
of the stream, i.e., the window $W = \{a_{t-(N-1)}, \ldots, a_t\}$. Those elements are called active elements,
whereas elements that arrived prior to the current window, $\{a_i \mid 0 \le i < t - (N - 1)\}$, are expired. For
$t < N$, the window consists of all the elements received so far, $\{a_0, \ldots, a_t\}$.
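To make the definition concrete, the following minimal sketch enumerates the active elements at every time step. It stores the whole window, which is exactly what a sliding window algorithm must avoid, so it serves only as an illustration of the model; the function name `sliding_windows` is ours and does not appear elsewhere in the paper.

```python
from collections import deque

def sliding_windows(stream, N):
    """Yield (t, active elements) for every arrival time t."""
    window = deque(maxlen=N)      # the oldest element expires automatically
    for t, a_t in enumerate(stream):
        window.append(a_t)        # a_t becomes active at time t
        yield t, list(window)

for t, active in sliding_windows([10, 20, 30, 40, 50], N=3):
    print(t, active)
```

For $t < N - 1$ the window is simply the prefix of the stream seen so far; afterwards it always holds exactly $N$ elements.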

Usually, both $u$ and $N$ are considered to be extremely large, so it is not feasible to store the
entire stream (or even one entire window) in memory. The problem is to calculate various
characteristics of the window's elements using a small amount of memory (usually poly-logarithmic
in $N$ and $u$). We refer the reader to the books of Muthukrishnan [39] and Aggarwal (ed.) [1] for
extensive surveys on data stream models and algorithms.

One of the main open problems in data streams deals with the relations between the different 
streaming models [36], specifically between the unbounded stream model and the sliding window 
model. In this paper we provide another important step in clarifying the connection between these
two models by showing that finding $L_p$-heavy hitters is just as doable in the sliding window model
as on the entire stream.



Motivation. We focus on approximation algorithms for certain statistical characteristics of data
streams, specifically, finding frequent elements. The problem of finding frequent elements in a stream is
of interest for many applications, such as network monitoring [32] and DoS prevention [23], [18], [H],
and was extensively explored over the last decade (see [39], [17] for a definition of the problem and a
survey of existing solutions, as well as [H], [35], [26], 13^, [19], [31], [20], [33], [27]).

Determining which elements are considered heavy can be interpreted in several ways. The most
common way is to define a fraction of the stream size, and say that every element that appears more
times than this threshold is heavy. A more powerful notion is to define a heavy element
as one that appears more times than a specified fraction of the second frequency norm of the stream
(or window). Recall that the $L_p$ norm of the frequency vector (see footnote 1) is defined by $L_p = \left(\sum_i n_i^p\right)^{1/p}$, where
$n_i$ is the frequency of element $i \in [u]$, i.e., the number of times $i$ appears in the window. Thus, the
$L_1$ norm is fixed to the size of the stream (or window), while the $L_2$ norm is affected by the specific
frequencies of the elements in the stream.

Finding frequent elements with respect to the $L_2$ norm is a more difficult task than the equivalent
$L_1$ problem. To demonstrate this, consider the following example: let $S$ be a stream of size $N$ in
which the element $a_1$ appears $\sqrt{N}$ times, while the rest of the elements $a_2, \ldots$ appear exactly once
in $S$. It follows that $n_1 = \frac{1}{\sqrt{N}} L_1$ while $n_1 = c L_2$, where $c$ is a constant lower bounded by $\frac{1}{\sqrt{2}}$. Any
algorithm that finds all the elements which are heavier than $\gamma L_1$ with memory $\mathrm{poly}(\gamma^{-1}, \log N, \log u)$
cannot identify $a_1$ as a heavy element.
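The example can be checked numerically. The sketch below builds such a stream for $N = 10{,}000$ (an arbitrary choice of ours) and compares the frequency of the repeated element against both norms; the helper name `norms` is illustrative only.

```python
import math
from collections import Counter

def norms(stream):
    """Return the frequency table together with the L1 and L2 frequency norms."""
    freq = Counter(stream)
    l1 = sum(freq.values())
    l2 = math.sqrt(sum(n * n for n in freq.values()))
    return freq, l1, l2

N = 10_000
root = math.isqrt(N)                                   # sqrt(N) = 100
# element 1 appears sqrt(N) times, the other N - sqrt(N) elements appear once
stream = [1] * root + list(range(2, N - root + 2))
freq, l1, l2 = norms(stream)

print(freq[1] / l1)   # equals 1/sqrt(N): a vanishing fraction of L1
print(freq[1] / l2)   # at least 1/sqrt(2): a constant fraction of L2
```

The first ratio shrinks as $N$ grows, whereas the second stays bounded away from zero, which is why an $L_1$ threshold with $\gamma^{-1}$ poly-logarithmic in $N$ misses this element.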

Generally speaking, identifying frequent elements (heavy hitters) with respect to $L_p$ is better for
larger values of $p$ [30]. We focus on solving the following $L_2$-heaviness problem:

Definition 1 (($\gamma, \epsilon$)-approximation of $L_2$-frequent elements, [30], [38]). For $0 < \epsilon, \gamma < 1$, output any
element $i \in [u]$ such that $n_i > \gamma L_2$ and no element such that $n_i < (1 - \epsilon)\gamma L_2$.
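The definition leaves a gap between the two thresholds in which an algorithm is free to output an element or not. The following exact, non-streaming reference makes the three regions explicit; it is only a specification aid under our own naming (`classify`), not a small-memory algorithm.

```python
import math
from collections import Counter

def classify(window, gamma, eps):
    """Split the elements of a window into must-output, optional and forbidden sets."""
    freq = Counter(window)
    l2 = math.sqrt(sum(n * n for n in freq.values()))
    must = {i for i, n in freq.items() if n > gamma * l2}
    forbidden = {i for i, n in freq.items() if n < (1 - eps) * gamma * l2}
    optional = set(freq) - must - forbidden   # either answer is acceptable here
    return must, optional, forbidden

window = [1] * 8 + [2] * 5 + [3, 4, 5]
print(classify(window, gamma=0.6, eps=0.2))
```

Any output list that contains the first set and avoids the third satisfies the definition for that window.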

The $L_2$ norm is the most powerful norm for which we can expect a poly-logarithmic solution for
the frequent-elements problem. This is due to the known lower bound of $\Omega(u^{1-2/p})$ for calculating



1 Throughout the paper we use the term "$L_p$ norm" to indicate the $L_p$ norm of the frequency vector, i.e., the $p$th
root of the $p$th frequency moment $F_p = \sum_i n_i^p$ [2], rather than the norm of the data itself.

2 In order for the algorithm to be poly-logarithmic, it must be the case that $\gamma^{-1} = O(\log^c N)$ for some constant $c$.
$L_p$ over a stream |4H [6], which results from observing the amount of memory required to identify an
element whose frequency is a constant fraction of the $L_p$ norm.

There has been a lot of progress on the question of finding $L_1$-frequent elements in the sliding
window model [H], [43], [27]; however, those algorithms cannot be used to find $L_2$-frequent elements
with efficient memory. In 2002, Charikar, Chen and Farach-Colton [14] showed an algorithm that
can approximate the "top $k$" frequent elements in an unbounded stream, where $k$ is given as an
input. Formally, their algorithm outputs only elements with frequency larger than $(1 - \epsilon)\phi_k$, where
$\phi_k$ is the frequency of the $k$th most frequent element in the stream, using memory proportional to
$L_2^2/(\epsilon\phi_k)^2$. Since the "heaviness" in this case is relative to $\phi_k$, and the memory is bounded by the
fraction $L_2^2/(\epsilon\phi_k)^2$, Charikar et al.'s algorithm in fact finds heaviness in terms of the $L_2$ norm. A
natural question is whether one can develop an algorithm for finding frequent elements that appear
at least $\gamma L_2$ times in the sliding window model, using $\mathrm{poly}(\gamma^{-1}, \log N, \log u)$ memory.

We give the first solution for the problem of finding an $\epsilon$-approximation of the $L_2$-frequent elements
in the sliding window model. Our algorithm is able to identify elements that appear within the window
a number of times which is at least a $\gamma$-fraction of the $L_2$ norm of the window, up to a multiplicative
factor of $(1 - \epsilon)$. In addition, the algorithm guarantees to output all the elements with frequency at
least $(1 + \epsilon)\gamma L_2$.

Our solution gives another step in the direction of making a connection between the unbounded 
and sliding window models, as it provides an answer for the very important question of heavy hitters 
in the sliding window model. The result joins the various solutions of finding $L_1$-heavy hitters in
sliding windows |26j EJ HUJ HI 03j ETJ [28] , and can be used in various algorithms that require identifying 
$L_2$ heavy hitters, such as [31], [8] and others. We note that the problem considered here concerns the most
powerful $L_p$ that can have a poly-logarithmic solution, specifically, the $L_2$ norm [30]. More generally,
our paper resolves the question of finding $L_p$-heavy elements on sliding windows for all values of $p$
that allow small-memory one-pass solutions (i.e., all $0 < p \le 2$).

To achieve our result on $L_2$ heavy hitters, we combine (in a non-trivial way) existing techniques.
Variants of these techniques sufficient to derive similar results have been known since 2002 (see footnote 3). Surprisingly,
no algorithm for $L_2$ heavy hitters was reported despite several papers on $L_1$ heavy hitters. Since
$L_2$ heavy hitters play a central role in many fundamental streaming problems (such as frequency
moments and many others), we suspect that our $L_2$ heavy hitters algorithm will significantly increase
our understanding of sliding windows.

A Broader Perspective. In fact, one can consider the tools we develop for the frequent elements 
problem as a general method that allows obtaining a sliding window solution out of an algorithm for 
the unbounded model, for a wide range of functions. We explain this concept in this section. 

Many statistical properties were aggregated into families, and efficient algorithms were designed 
for those families. For instance, Datar, Gionis, Indyk and Motwani, in their seminal paper [21], showed
that a sliding window estimation is easy to achieve for any function which is weakly additive by using
a data structure named exponential histograms [21]; for certain functions that decay with time, one
can maintain time-decaying aggregates [16]; another data structure, named smooth-histogram [9], can
be used in order to approximate an even larger set of functions, known as smooth functions. See pQ 
for a survey of synopsis construction. 

In this paper we introduce a new concept which uses a smooth-histogram in order to perform sliding 
window approximation of non-smooth properties. Informally speaking, the main idea is to relate the 

3 Indeed, we use the algorithm of Charikar et al. [14], which has been known since 2002. Also, it is possible to replace (with
some non-trivial effort) our smooth histogram method for $L_2$ computation with the algorithm of Datar, Gionis, Indyk
and Motwani [21] for $L_2$ approximation.
non-smooth property $f$ with some other, smooth (see footnote 4) property $g$, such that changes in $f$ are bounded
by the changes in $g$. By maintaining a smooth-histogram for the smooth function $g$, we partition
the stream into sets of sub-streams (buckets). Due to the properties of the smooth-histogram we can
bound the error (of approximating $g$) for every sub-stream, and thus get an approximation of $f$. We
use the term semi-smooth to describe these kinds of algorithms.

We demonstrate the above idea by showing a concrete efficient sliding window algorithm for the
properties of rarity and similarity [22]; we stress that neither is smooth. In addition to those properties,
we believe that the tools we develop here can be used to build efficient sliding window approximations
for many other (non-smooth) properties, and that they provide a general new method for computing on sliding
windows. It is important to note that trying to build a smooth-histogram (or any other known sketch)
directly for $f$ will not preserve the required invariants, and the memory consumption might not be
efficient.

1.1 Previous works 

Frequent elements problem. Finding elements that appear many times in the stream ("heavy 
hitters") is a very central question and thus has been extensively studied both for the unbounded 
model [El [37] and for the sliding window model [31 [40] [43] [27] as well as other variants such as the 
offline stream model [35], insertion and deletion model [201 [32], finding heavy-distinct-hitter [5], etc. 
Reducing the processing time was done by [M] into O(^) and by |28j into 0(1). 

Another problem related to finding the heavy hitters is the top-$k$ problem, namely, finding
the $k$ most frequent elements. As mentioned above, Charikar, Chen and Farach-Colton [14] provide
an algorithm that finds the $k$ most frequent elements in the unbounded model (up to a precision of
$1 \pm \epsilon$). Golab, DeHaan, Demaine, Lopez-Ortiz and Munro [26] solve this problem for the jumping
window model.

Similarity and a-rarity problem. The similarity problem was defined in order to give a rough 
estimation of closeness between files over the web [33] (and independently in [pS] ). Later, it was shown 
how to use min-hash functions [29] in order to sample from the stream, and estimate the similarity of 
two streams. 

The notion of $\alpha$-rarity, introduced by Datar and Muthukrishnan [22], is that of finding the fraction
of elements that appear exactly $\alpha$ times within the stream. This quantity can be seen as finding the fraction
of elements with frequency within certain bounds.

The questions of rarity and similarity were analyzed, both for the unbounded stream and the
sliding window models, by Datar and Muthukrishnan [22], achieving an expected memory bound
of $O(\log N + \log u)$ words of space for constant $\epsilon, \alpha, \delta$. At the bit level, their algorithm yields
$O(\alpha \cdot \epsilon^{-3} \log \delta^{-1} \log N (\log N + \log u))$ bits for $\alpha$-rarity and $O(\epsilon^{-3} \log \delta^{-1} \log N (\log N + \log u))$ bits for
similarity, with $1 - \delta$ being the probability of success (see footnote 5).

1.2 Our results 

We build the first efficient sliding window approximation for the $L_2$-frequent-elements problem, prove
its correctness and bound its memory consumption by $O\left(\frac{1}{\gamma^2\epsilon^4}\log N\log\frac{N}{\delta} + \frac{1}{\epsilon^4}\log N\log\frac{1}{\epsilon}\right)$ words.

Theorem 1. There exists a sliding window algorithm that obtains an $\epsilon$-approximation for the $L_2$-frequent-elements
problem, in space $\mathrm{poly}(\epsilon^{-1}, \gamma^{-1}, \log N, \log \delta^{-1})$.

4 Of course, other kinds of aggregations can be used, however our focus is on smooth histograms. 
5 These bounds are not explicitly stated in the paper [22], but follow from the analysis (see Lemma 1 and Lemma 2 
in [22]). 
Other results reported in this paper are efficient algorithms for the rarity and similarity problems
(see Sections 4.2 and 4.3 for detailed definitions of the problems). Although for these problems there
already exist algorithms with efficient yet expected memory bounds, our techniques achieve
worst-case memory bounds of essentially the same magnitude (up to a factor of $\log \log N$).

As mentioned above, the tools we develop here can be seen as a general method of building a
sliding window approximation for a non-smooth property $f$, using a related smooth property $g$. For
the problem of finding frequent elements, we use the $L_2$ norm as the smooth property $g$, while for
estimating rarity and similarity we set $g$ to be the number of distinct elements in the stream.

1.3 High-Level Ideas 

The main idea behind our approach is quite simple: partitioning the streams into a poly-logarithmic 
number of partitions (buckets) and estimating the frequencies in each. Although this approach is 
not new, most of the previous algorithms partition the stream in a correlative way to the estimated 
function (e.g., exponential and smooth histogram). There have been several works that partitioned 
the stream arbitrarily, e.g., by splitting the stream into Basic Windows [H] which provides a solution 
for a slightly different model [26].

The novel technique, and the main conceptual contribution of our paper, is to partition the stream
according to some other function $g$, rather than the function $f$ we wish to estimate. As long as $g$ has
structurally nice properties (e.g., it is a smooth function, weakly additive, etc.) and there is a special
type of correlation (as we illustrate in this paper on a number of examples) between $g$ and the function
we wish to estimate, the partition according to $g$ gives a useful way of obtaining an estimation of $f$
with efficient memory.

It is important to note that the method presented here is different from the concept of Purifier [10] , 
recently presented by Braverman and Ostrovsky, despite superficial similarities. More specifically, in 
[10] a description of Purifier is given: an informal method of reductions that is applied to solve a
wide class of functions. Both methods use an "easier" function $g$ as a building block for computing a
"difficult" function $f$. However, the ways of applying the reductions are completely different. To begin
with, the purifier is applied on unbounded streams while our method is applied on sliding windows.
Most importantly, in [10] the "easier" function $g$ is used as a substitute for $f$ under certain conditions.
In this paper we use "easier" functions to "format" the stream into the right buckets. We explicitly state
that $g$ cannot be used as an approximation of $f$ since $g$ is smooth and $f$ is not. Applying $g$ as the right
formatting tool is exactly the novelty of our method.

Our scheme begins with building a smooth-histogram [9] for approximating the $L_2$ norm of the
frequency vector of the window. By doing so, we partition the stream into $O(\frac{1}{\epsilon^2} \log N)$ buckets, so that
the current window is always a suffix of the first bucket $A_1$, and their frequency $L_2$ norms are "close".
Moreover, the smooth-histogram provides us with an $\epsilon$-estimation of the frequency $L_2$ norm of the
window, $\hat{L}_2$. This value is also a good estimation of the frequency $L_2$ norm of the bucket $A_1$ (with a
different parameter $\epsilon'$), since the window and the bucket have close frequency $L_2$ norms.

Next we show that every element that is "heavy" in the window is also heavy in the bucket $A_1$.
This allows us to use the method of Charikar et al. in order to obtain a list of the most frequent elements
in the bucket, along with their approximated frequencies, $\hat{n}_i$. Setting the memory to $O(\epsilon^{-2}\gamma^{-2}\log N)$,
Charikar et al.'s algorithm is guaranteed to output any element with frequency higher than a specified
threshold, specifically $(1 + \epsilon)\gamma L_2$. However, it might output other elements which are less frequent.

Finally, we use the approximated frequencies $\hat{n}_i$ of the elements in the bucket $A_1$, as output
by Charikar et al.'s algorithm, along with the $L_2$ approximation of the window, $\hat{L}_2$, as given by the
smooth-histogram, in order to output only those elements which are heavy enough in the window.
This method is used again for estimating the $\alpha$-rarity and the similarity of two streams. Instead of
partitioning the stream according to the estimated value of the rarity (or similarity), we partition the
stream according to the number of distinct elements. Namely, we build a smooth-histogram so that
the bucket $A_1$ and the window have approximately the same number of distinct elements. We
then use approximated min-wise hash functions [29] to sample elements from the bucket. Using this
sampling, following the techniques of Broder [11] and of Datar and Muthukrishnan [22] for unbounded
streams, we are able to obtain the specific estimation (i.e., of similarity or $\alpha$-rarity) on the bucket.
Finally, we show that for this specific partition, an estimation on the bucket is a good estimation on
the current window.

1.4 Organization of the paper 

Section 2 describes our notations and several of the tools we use. In Section 3 we define and discuss
the frequent-elements problem, and provide an algorithm which solves the approximation version
of the frequent-elements problem in the sliding window model. Section 4 describes and analyzes a
semi-smooth algorithm for the $\alpha$-rarity problem and a semi-smooth algorithm for approximating the
similarity of two streams in the sliding window model. Conclusions are given in Section 5.

2 Preliminaries 

2.1 Notations 

We say that an algorithm $A$ is an $(\epsilon, \delta)$-approximation of a function $f$ if $(1 - \epsilon)f \le A \le (1 + \epsilon)f$,
except with probability $\delta$ over $A$'s coin tosses. We denote this relation as $A \in (1 \pm \epsilon)f$ for short. We
denote an output of an approximation algorithm with a hat symbol, e.g., the estimator of $f$ is denoted
$\hat{f}$.

The set $\{1, 2, \ldots, n\}$ is usually denoted as $[n]$. If the stream $B$ is a suffix of $A$, we denote $B \subseteq_r A$.
For instance, let $A = \{q_1, q_2, \ldots, q_n\}$; then $B = \{q_{n_1}, q_{n_1+1}, \ldots, q_n\} \subseteq_r A$ if $1 \le n_1 \le n$. The notation
$A \cup C$ denotes the concatenation of the stream $C = \{c_1, c_2, \ldots, c_m\}$ to the end of stream $A$, i.e.,
$A \cup C = \{q_1, q_2, \ldots, q_n, c_1, c_2, \ldots, c_m\}$. The notation $|A|$ denotes the number of different elements in
the stream $A$, that is, the cardinality of the set induced by the multiset $A$. The size of the stream (i.e.,
of the multiset) $A$ will be denoted as $\|A\|$, e.g., for the example above $\|A\| = n$.

We use the notation $\tilde{O}(\cdot)$ to indicate an asymptotic bound which suppresses any term of magnitude
$\mathrm{poly}(\log\frac{1}{\epsilon}, \log\log\frac{1}{\delta}, \log\log N, \log\log u)$.

2.2 Smooth Histograms 

Recently, Braverman and Ostrovsky [9] showed that any function / that can be calculated (or approxi- 
mated) in the unbounded stream model, and belongs to a family of functions named smooth-functions, 
can also be e-approximated in the sliding window model. Formally, 

Definition 2 (Smooth-Functions [9]). A polynomial function $f$ is $(\alpha, \beta)$-smooth if it satisfies the
following properties: (i) $f(A) \ge 0$; (ii) $f(A) \ge f(B)$ for $B \subseteq_r A$; and (iii) there exist $0 < \beta \le \alpha < 1$
such that if $(1 - \beta)f(A) \le f(B)$ for $B \subseteq_r A$, then $(1 - \alpha)f(A \cup C) \le f(B \cup C)$ for any adjacent $C$.

If an $(\alpha, \beta)$-smooth $f$ can be calculated (or $(\epsilon, \delta)$-approximated) on an unbounded stream with
memory $g(\epsilon, \delta)$, then there exists an $(\alpha + \epsilon, \delta)$-estimation of $f$ in the sliding window model using
$O\left(\frac{1}{\beta}\log N\,(g(\epsilon, \delta) + \log N)\right)$ bits.
The key idea is to save a "smooth-histogram" (SH) [9], a structure that contains estimations
on $O(\frac{1}{\beta}\log N)$ suffixes of the stream, $A_1 \supseteq_r A_2 \supseteq_r \cdots \supseteq_r A_m$ with $m = O(\frac{1}{\beta}\log N)$. Each suffix $A_i$ is called a
bucket. Each new element in the stream initiates a new bucket; however, adjacent buckets with a
close estimation of $f$ are removed (keeping only one representative). Since the function is "smooth",
i.e., monotonic and slowly changing, it is enough to save $O(\frac{1}{\beta}\log N)$ buckets in order to maintain a
reasonable approximation of the window. At any given time, the current window $W$ is between buckets
$A_1$ and $A_2$, i.e., $A_1 \supseteq_r W \supseteq_r A_2$. Once the window "slides" and the first element of $A_2$ expires, we
delete the bucket $A_1$ and renumber the indices so that $A_2$ becomes the new $A_1$, $A_3$ becomes the new
$A_2$, etc. We use the value of the smooth function on bucket $A_1$ to estimate the value of the
function on the current window. The relation between the window and the bucket as derived in [9] is
given by

$$(1 - \alpha) f(A_1) \le f(A_2) \le f(W) \le f(A_1).$$
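The bucket maintenance described above can be sketched as follows. This is a toy version under our own simplifying assumptions: every bucket keeps an exact frequency table of its suffix (a real smooth-histogram would keep a small sketch per bucket instead), the pruning threshold $1 - \beta/2$ follows the usual presentation of smooth histograms, and the class and method names are hypothetical.

```python
from collections import Counter

class SmoothHistogram:
    """Toy smooth histogram; each bucket is a suffix of the stream, tracked exactly."""

    def __init__(self, window, beta, f):
        self.window = window   # N, the window length
        self.beta = beta       # smoothness parameter used for pruning
        self.f = f             # maps a frequency Counter to the value of f
        self.starts = []       # start index of each bucket, increasing
        self.counts = []       # exact frequency table of each bucket
        self.t = -1            # index of the last element seen

    def add(self, x):
        self.t += 1
        # every new element opens a new bucket and is appended to all buckets
        self.starts.append(self.t)
        self.counts.append(Counter())
        for c in self.counts:
            c[x] += 1
        # prune: for bucket i keep only the farthest bucket j whose value is close
        i = 0
        while i < len(self.starts) - 2:
            threshold = (1 - self.beta / 2) * self.f(self.counts[i])
            j = len(self.starts) - 1
            while j > i + 1 and self.f(self.counts[j]) < threshold:
                j -= 1
            del self.starts[i + 1:j]
            del self.counts[i + 1:j]
            i += 1
        # expire: once the first element of A_2 has expired, drop A_1
        first_active = self.t - self.window + 1
        while len(self.starts) >= 2 and self.starts[1] < first_active:
            del self.starts[0]
            del self.counts[0]

    def estimate(self):
        """Value of f on the first bucket A_1, used as the estimate for the window."""
        return self.f(self.counts[0])
```

With `f=len` the structure tracks the number of distinct elements; the first bucket always contains the window, so for a monotone $f$ the estimate never falls below the true value on the window.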

3 A Semi-Smooth Estimation of Frequent Elements

In this section we develop an efficient semi-smooth algorithm for finding the elements which occur
frequently within the window. Let $n_i$ be the frequency of an element $i \in \{1, \ldots, u\}$, i.e., the number of
times $i$ appears in the window. The first frequency norm and the second frequency norm of the window
are defined by $L_1 = \sum_{i=1}^{u} n_i = N$ and $L_2 = \left(\sum_{i=1}^{u} n_i^2\right)^{1/2}$. Our notion of approximating frequent
elements is given in Definition 1. An equivalent definition to Definition 1, which we use in our proof,
is the following:

Definition 3. For $0 < \epsilon, \gamma < 1$, output any element $i \in [u]$ with frequency higher than $(1 + c\epsilon)\gamma L_2$,
and do not output any element with frequency lower than $(1 - c\epsilon)\gamma L_2$, for a constant $c$.

Many papers [19j [3j [39j H31 EZ] make use of the following (weaker) definition 

Definition 4 (($\gamma, \epsilon$)-approximation of $L_1$-heavy hitters). Output any element $i \in [u]$ such that $n_i > \gamma L_1$
and no element such that $n_i < (1 - \epsilon)\gamma L_1$.

An $L_2$ approximation is stronger than the above $L_1$ definition [30], since when a certain element
is heavy in terms of the $L_1$ norm, it is also heavy in terms of the $L_2$ norm,

$$n_i \ge \gamma L_1 = \gamma \sum_j n_j \;\Longrightarrow\; n_i^2 \ge \gamma^2 \Big(\sum_j n_j\Big)^2 \ge \gamma^2 \sum_j n_j^2 = (\gamma L_2)^2,$$

while the opposite direction is not true in the general case.

In order to identify the frequent elements in the current window, we use an algorithm by Charikar
et al. [14], which provides an $\epsilon$-approximation (for the unbounded stream model) of the following
problem:

Definition 5 (($k, \epsilon$)-top frequent approximation [14]). Output a list of $k$ elements such that every
output element $i$ has a frequency $n_i > (1 - \epsilon)\phi_k$, where $\phi_k$ is the frequency of the $k$th most frequent
element in the stream.

The algorithm of Charikar et al. guarantees that any element that satisfies $n_i > (1 + \epsilon)\phi_k$ appears
in the output. This algorithm runs on a stream of size $n$ and succeeds with probability at least $1 - \delta$
with memory complexity of

for every $\delta > 0$, given that $\phi_k \ge \gamma L_2$.
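For readers unfamiliar with the data structure of Charikar et al., the following is a minimal sketch of a plain count sketch: several rows of counters, a bucket hash and a sign hash per row, and a median over the per-row estimates. It is not the parameterized variant defined in the appendix. The use of `hashlib.blake2b` as a stand-in for the pairwise-independent hash families of the original analysis, and the names `rows` and `width`, are our assumptions.

```python
import hashlib
from statistics import median

class CountSketch:
    """Plain count sketch: `rows` independent rows of `width` signed counters."""

    def __init__(self, rows, width):
        self.rows, self.width = rows, width
        self.table = [[0] * width for _ in range(rows)]

    def _hash(self, row, x):
        # derive a bucket index and a +/-1 sign from one 64-bit digest per (row, x)
        digest = hashlib.blake2b(f"{row}:{x}".encode(), digest_size=8).digest()
        v = int.from_bytes(digest, "big")
        bucket = v % self.width
        sign = 1 if (v >> 63) & 1 else -1
        return bucket, sign

    def add(self, x, count=1):
        for r in range(self.rows):
            b, s = self._hash(r, x)
            self.table[r][b] += s * count

    def estimate(self, x):
        # each row gives an unbiased estimate; the median damps bad collisions
        ests = []
        for r in range(self.rows):
            b, s = self._hash(r, x)
            ests.append(s * self.table[r][b])
        return median(ests)

cs = CountSketch(rows=5, width=256)
for x in [0] * 1000 + list(range(1, 2001)):
    cs.add(x)
print(cs.estimate(0))   # close to the true frequency 1000
```

The error of each row is governed by the $L_2$ norm of the colliding light elements, which is why the memory needed to separate heavy elements is naturally expressed relative to $L_2$.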

Definition 5 and Definition 1 do not describe the same problem, yet they are strongly connected.
In fact, our method allows solving the frequent elements problem under both definitions; however, in
this paper we focus on solving the $L_2$-frequent-elements problem, as defined by Definition 3. In order
to do so, we use the algorithm CountSketch$_b$, given by Charikar et al., with specific parameters
tailored for our problem. The algorithm outputs a list of elements, and is guaranteed to output every
element with frequency at least $(1 + \epsilon')\gamma L_2$ and no element of frequency less than $(1 - \epsilon')\gamma L_2$, for
an input parameter $\epsilon'$. See Appendix A for a full definition of CountSketch$_b$ and a proof of its
properties.

Note that a solution of the $k$-top frequent problem (Definition 5) in the sliding window model can
be obtained with the same technique, using the original algorithm of Charikar et al.

3.1 Semi-smooth algorithm for frequent elements approximation 

Our algorithm builds a smooth-histogram for the $L_2$ norm, and partitions the stream into buckets
accordingly. It is known that the $L_2$ property is an $(\epsilon, \frac{\epsilon^2}{2})$-smooth function [9]. Using the method of
Charikar et al. [14] separately on each bucket, with a careful choice of parameters, we are able to
approximate the $(\gamma, \epsilon)$-frequent elements problem on a sliding window (Figure 1).



ApproxFreqElements(7, e, 5) 

1. Maintain an $(\frac{\epsilon}{2}, \frac{\delta}{2})$-estimation of the $L_2$ norm of the window, using a smooth-histogram.

2. For each bucket of the smooth-histogram, $A_1, A_2, \ldots$, maintain an approximated list of the
k = -K + 1 most frequent elements, by running (7, |, |) — CountSketchj,. 

(see the description of CountSketch$_b$ in Appendix A).

3. Let $\hat{L}_2$ be the approximated value of the $L_2$ norm of the current window $W$, as given by
the smooth-histogram. Let $q_1, \ldots, q_k \in \{1, \ldots, u\}$ be the list of the $k$ heaviest
elements in $A_1$, along with $\hat{n}_1, \ldots, \hat{n}_k$, the estimation of their frequencies, as output by
CountSketch$_b$.

4. Output any element % that satisfies > j^jL/2- 



Theorem 2. The semi-smooth algorithm ApproxFreqElements (Fig. 1) is a $(\gamma, \epsilon)$-approximation
of the $L_2$-frequent elements problem, with success probability at least $1 - \delta$.

That is, there exists a constant $c$ such that for a small enough $\epsilon$, the algorithm returns a list which
includes all the elements with frequency at least $(1 + c\epsilon)\gamma L_2(W)$ and no element with frequency lower
than $(1 - c\epsilon)\gamma L_2(W)$.

See proof in Appendix B.

Memory usage. The memory usage of the protocol is composed of two parts: maintaining an
$(\epsilon/2, \delta/2)$-smooth-histogram of $L_2$, and running CountSketch$_b$ on each of the buckets. According
to [9] (Corollary 5), maintaining a smooth-histogram for $L_2$ can be done with memory



for a relative error of $\epsilon/2 + \epsilon^2/8$, with success probability at least $1 - \delta/2$. For a small enough $\epsilon$,
$\epsilon/2 + \epsilon^2/8 < \epsilon$, as required.



Figure 1: A semi-smooth algorithm for the frequent elements problem 
As for the second part, recall that one instance of CountSketch$_b$ requires a memory of $O(\frac{1}{\gamma^2\epsilon^2}\log\frac{n}{\delta})$
(see Appendix A), where $n$ is the size of the input. In our case the maximal size of the input is the
size of the first bucket $A_1$. Note that $\log n = O(\log N)$ since $(1 - \alpha)L_2(A_1) \le L_2(W) \le N$. The
number of CountSketch$_b$ instances is bounded by the number of buckets, $O(\frac{1}{\epsilon^2}\log N)$ [9], which
leads to a total memory bound of

$$O\left(\frac{1}{\gamma^2\epsilon^4}\log N\log\frac{N}{\delta} + \frac{1}{\epsilon^4}\log N\log\frac{1}{\epsilon}\right).$$

Extensions to different $L_p$. It is easy to see that the same method can be used in order to
approximate $L_p$-heavy elements for any $0 < p \le 2$, up to a $1 \pm \epsilon$ precision. The algorithm and
analysis remain the same, except for using a smooth-histogram for the $L_p$ norm, and changing the
parameters by constants.

Theorem 3. There exists a sliding window algorithm that outputs all elements with frequency at least
$(1 + \epsilon)\gamma L_p$, and no element with frequency less than $(1 - \epsilon)\gamma L_p$. The algorithm succeeds with probability
at least $1 - \delta$ and takes memory of $\mathrm{poly}(\epsilon^{-1}, \gamma^{-1}, \log N, \log \delta^{-1})$.

4 Estimation of Non-Smooth Properties Relativized to the Number 
of Distinct Elements 

In this section we extend the method shown above and apply it to other non-smooth functions. While
above we relate the frequent-elements problem to the smooth $L_2$ problem, in this section we use a
different smooth function to partition the stream, namely the distinct elements count. This
allows us to provide efficient semi-smooth approximations for the similarity and $\alpha$-rarity (non-smooth)
properties.




4.1 Preliminaries 

We now show that counting the number of distinct elements in a stream is smooth. This allows 
us to partition the stream into a smooth-histogram structure, where any two adjacent buckets have 
approximately the same number of distinct elements. 

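Proposition 4. Define $\mathrm{DEC}(A)$ as the number of distinct elements in the stream $A$, i.e., $\mathrm{DEC}(A) = |A|$.
The function DEC is an $(\epsilon, \epsilon)$-smooth function, for every $0 < \epsilon < 1$.

Proof. Properties (i) and (ii) of Definition 2 follow directly from DEC's definition. As for property
(iii), assume that $B \subseteq_r A$ and $(1 - \epsilon)\mathrm{DEC}(A) \le \mathrm{DEC}(B)$; then

$$\begin{aligned}
(1 - \epsilon)\,\mathrm{DEC}(A \cup C) &= (1 - \epsilon)\left[\mathrm{DEC}(A) + \mathrm{DEC}(C \setminus A)\right] \\
&\le \mathrm{DEC}(B) + (1 - \epsilon)\,\mathrm{DEC}(C \setminus A) \\
&\le \mathrm{DEC}(B) + \mathrm{DEC}(C \setminus B) \\
&= \mathrm{DEC}(B \cup C),
\end{aligned}$$

where "$A \setminus B$" represents the set consisting of the elements in $A$ but not in $B$. □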
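Property (iii) can also be sanity-checked by brute force on small random streams. The sketch below is only a test harness under assumptions of our own (alphabet size, stream lengths, $\epsilon = 3/10$, and the function names); exact rational arithmetic is used so that boundary cases are not affected by floating-point rounding.

```python
import random
from fractions import Fraction

def dec(stream):
    """Number of distinct elements in a stream."""
    return len(set(stream))

def check_property_iii(trials=2000, eps=Fraction(3, 10), seed=1):
    """Search random (A, B, C) for a violation of property (iii); True if none is found."""
    rng = random.Random(seed)
    for _ in range(trials):
        A = [rng.randint(1, 12) for _ in range(rng.randint(1, 15))]
        B = A[rng.randint(0, len(A) - 1):]            # a non-empty suffix of A
        C = [rng.randint(1, 12) for _ in range(rng.randint(0, 15))]
        if (1 - eps) * dec(A) <= dec(B):              # premise of property (iii)
            if (1 - eps) * dec(A + C) > dec(B + C):   # conclusion violated
                return False
    return True

print(check_property_iii())
```

By Proposition 4 no violation can exist, so the search is expected to come back empty for any choice of parameters.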

There have been many works on counting distinct elements in streams, initiated by Flajolet and 
Martin [21], and later improved by [21 E3 El E] . Very recently Kane, Nelson and Woodruff provided 
an optimal algorithm for $(\epsilon, \delta)$-approximating the number of distinct elements [33], with memory
$O((\frac{1}{\epsilon^2} + \log u)\log\frac{1}{\delta})$ bits and time $O(1)$. We use the method of Kane et al. in order to build a
smooth-histogram for the distinct elements count with memory $O\left((\log u + \frac{1}{\epsilon^2})\frac{1}{\epsilon}\log N\log\frac{1}{\delta} + \frac{1}{\epsilon}\log^2 N\right)$,
suppressing $\log\log N$ and $\log\frac{1}{\epsilon}$ terms.

Another tool we use is min-wise hash functions [11], [12], used in various algorithms in order to estimate
different characteristics of data streams, especially the similarity of two streams [H]. Informally
speaking, these functions have a meaning of sampling an element uniformly from the stream, and thus 
they are a very useful tool. 

Definition 6 (min-hash [El]). Let Π = {π_i} be a family of permutations over [u] = {1, ..., u}. For a
subset A ⊆ [u] define h_i to be the minimal permuted value of π_i over A,

$$ h_i(A) = \min_{a \in A} \pi_i(a). $$

A family {h_i} of such functions is called exact min-wise independent hash functions (or min-hash) if
for any subset A ⊆ [u] and a ∈ A,

$$ \Pr_i\bigl(h_i(A) = \pi_i(a)\bigr) = \frac{1}{|A|}. $$

The family {h_i} is called ε-approximated min-wise independent hash functions (or ε-min-hash) if for
any subset A ⊆ [u] and a ∈ A,

$$ \Pr_i\bigl(h_i(A) = \pi_i(a)\bigr) \in \frac{1}{|A|}(1 \pm \epsilon). $$

A specific construction of ε-min-hash functions was presented by Indyk [29], using only
O(log(1/ε) log u) bits. The time per hash calculation is bounded by O(log(1/ε)).
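The sampling interpretation of the definition can be checked empirically. The sketch below is an illustration only: it draws explicit random permutations (exact min-hash, costing O(u) space each) instead of Indyk's compact ε-min-hash construction, it indexes the universe from 0 rather than 1, and the function names are hypothetical.

```python
import random
from collections import Counter


def random_permutation(u, rng):
    """Return a uniformly random permutation pi of {0, ..., u-1} as a list."""
    pi = list(range(u))
    rng.shuffle(pi)
    return pi


def min_hash(pi, subset):
    """h(A) = minimum of pi(a) over a in A."""
    return min(pi[a] for a in subset)


def minimizer_frequencies(subset, u, trials, seed=0):
    """Empirical frequency with which each a in A attains the min-hash value."""
    rng = random.Random(seed)
    hits = Counter()
    for _ in range(trials):
        pi = random_permutation(u, rng)
        value = min_hash(pi, subset)
        # pi is a bijection, so exactly one element of A attains the minimum.
        hits[next(a for a in subset if pi[a] == value)] += 1
    return {a: hits[a] / trials for a in subset}
```

For a subset of size 4, each returned frequency should be close to 1/4, matching the exact min-wise condition Pr(h(A) = π(a)) = 1/|A|.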

Min-hash functions can be used to estimate the similarity of two sets, by using the following
lemma.

Lemma 5. ([12]. See also [22].) For any two sets A and W and an ε'-min-hash function h_i,

$$ \Pr[h_i(A) = h_i(W)] = \frac{|A \cap W|}{|A \cup W|} \pm \epsilon'. $$

4.2 A Semi-Smooth Estimation for the α-Rarity Problem

In the following section we present an algorithm that estimates the α-rarity of a stream (in the sliding
window model), i.e., the ratio of elements that appear exactly α times in the window. The rarity
property is known not to be smooth, yet by using a smooth-histogram for the distinct elements count,
we are able to partition the stream into O((1/ε) log N) buckets, and estimate the α-rarity in each bucket.

Definition 7 (α-rarity, [22]). An element x is α-rare if it appears exactly α times in the stream. The
α-rarity measure, ρ_α, denotes the ratio of α-rare elements in the entire stream S, i.e.,

$$ \rho_\alpha = \frac{|\{x \mid x \text{ is } \alpha\text{-rare in } S\}|}{\mathrm{DEC}(S)}. $$
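As a reference point for the approximation algorithms, the definition can be evaluated exactly when the whole stream fits in memory. The helper below is a minimal sketch; the name rarity is hypothetical and a non-empty stream is assumed.

```python
from collections import Counter
from fractions import Fraction


def rarity(stream, alpha):
    """Exact alpha-rarity: (# elements appearing exactly alpha times) / DEC(stream)."""
    counts = Counter(stream)
    rare = sum(1 for c in counts.values() if c == alpha)
    return Fraction(rare, len(counts))  # assumes the stream is non-empty
```

For the stream a, a, b, c, c, c, d there are four distinct elements, two of which (b and d) appear exactly once, so the 1-rarity is 1/2.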



Our algorithm follows the method used in [22] to estimate α-rarity in the unbounded model. The
estimation is based on the fact that the α-rarity equals the fraction of min-hash functions
whose min-value appears exactly α times in the stream.

However, in order to estimate rarity over sliding windows, one needs to estimate the fraction of
min-hash functions whose min-value appears exactly α times within the window. Our algorithm
builds a smooth-histogram for DEC in order to split the stream into buckets, such that every two
consecutive buckets have approximately the same number of distinct elements. In addition,
we sample the bucket using a min-wise hash, and record the last α + 1 occurrences of the sampled
element x_i in the bucket. We estimate the α-rarity of the window by calculating the fraction of the
min-hash functions whose min-value x_i appears exactly α times within the window.
For feasibility reasons we use approximated min-wise hashes, and prove that this estimation is an
ε-approximation of the α-rarity of the current window (up to a pre-specified additive precision). The
semi-smooth algorithm ApproxRarity for α-rarity is defined in Figure 2.



ApproxRarity(ε, δ)

1. Randomly choose k (ε/2)-min-hash functions h_1, h_2, ..., h_k.

2. Maintain an (ε, δ/2)-estimation of the number of distinct elements by building a smooth
histogram.

3. For every bucket instance A_j of the smooth-histogram and for each one of the hash functions
h_i, i ∈ [k]:

(a) Maintain the value of the min-hash function h_i over the bucket, h_i(A_j).

(b) Maintain a list L_i(A_j) of the most recent α + 1 occurrences of h_i(A_j) in A_j.

(c) Whenever the value h_i(A_j) changes, re-initialize the list L_i(A_j), and continue
maintaining the occurrences of the new value h_i(A_j).

4. Output ρ̂_α, the ratio of the min-hash functions h_i that have exactly α active elements in
L_i(A_1), i.e., the ratio

ρ̂_α = |{i s.t. L_i(A_1) consists of exactly α active elements}| / k.



Figure 2: Semi-smooth algorithm for α-rarity
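The per-bucket bookkeeping of steps 3 and 4 can be sketched as follows. This is a simplified illustration and not the full algorithm: it handles a single bucket A_1 given as a list (no smooth histogram), uses explicit random permutations in place of (ε/2)-min-hash functions, identifies elements with integers in {0, ..., u-1}, and all class and function names are hypothetical.

```python
import random
from collections import deque


class MinHashSample:
    """State for one hash function h_i over one bucket A_j (steps 3a-3c)."""

    def __init__(self, pi, alpha):
        self.pi = pi                # explicit permutation standing in for an eps-min-hash
        self.alpha = alpha
        self.min_value = None       # current h_i(A_j)
        # Timestamps of the most recent alpha + 1 occurrences: the list L_i(A_j).
        self.occurrences = deque(maxlen=alpha + 1)

    def update(self, t, x):
        value = self.pi[x]
        if self.min_value is None or value < self.min_value:
            self.min_value = value  # step 3(c): new minimum, restart the list
            self.occurrences.clear()
        if value == self.min_value:
            self.occurrences.append(t)  # the deque drops the oldest beyond alpha + 1

    def has_alpha_active(self, window_start):
        active = sum(1 for t in self.occurrences if t >= window_start)
        return active == self.alpha


def estimate_rarity(bucket, window_start, alpha, u, k, seed=0):
    """Step 4: fraction of hash functions whose sampled element has exactly
    alpha occurrences at positions >= window_start of the bucket."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(k):
        pi = list(range(u))
        rng.shuffle(pi)
        sample = MinHashSample(pi, alpha)
        for t, x in enumerate(bucket):
            sample.update(t, x)
        hits += sample.has_alpha_active(window_start)
    return hits / k
```

Keeping α + 1 timestamps rather than α is what lets the test in step 4 distinguish "exactly α active occurrences" from "at least α + 1".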

The ApproxRarity algorithm provides an (ε, δ)-approximation for the α-rarity problem, up to an
additive error of ε. As proven by Datar et al. [22], the ratio of min-hash functions that have exactly
α active elements in the window is an estimation of ρ_α. This is true even if we use the min-value of the
inclusive bucket A_1 rather than the min-value of the current window W.

Theorem 6. The semi-smooth algorithm (Fig. 2) is an (ε, δ)-approximation for the α-rarity problem,
up to an additive precision.

See proof in Appendix C.

Memory Usage. The memory consumption of the ApproxRarity algorithm is as follows.
Maintaining a smooth histogram for DEC is done using the method of Kane et al. [33] as the underlying
algorithm for DEC in the unbounded model, with memory
O((log u + 1/ε²)(1/ε) log N log(1/δ) + (1/ε) log² N); k seeds for the (ε/2)-min-hash functions:
O(k log(1/ε) log u); saving a list L_i and a value h_i for each bucket A_j and for i ∈ [k]:
O((k/ε)(log u + α log N) log N).

We note that this improves the expected memory bound of Datar et al. [22] into a worst-case
bound of the same magnitude (up to a log log N term). In most practical cases log u and
log N are very close, and we can assume that log u = O(log N). In that case, the space complexity
is O((kα/ε) log² N) bits, with k = Ω((1/ε²) log(1/δ)), and the time complexity is O((k/ε) log N)
calculations per element, suppressing poly(log(1/ε), log log N) terms.

4.3 A Semi-Smooth Estimation of Streams Similarity



In this section we present an algorithm for calculating the similarity of two streams X and Y. As 
in the case of the rarity, the similarity property is known not to be smooth, however we are able to 
design a semi-smooth algorithm that estimates it. We maintain a smooth-histogram of the distinct 
elements count in order to partition each of the streams, and sample each bucket of this partition using 
a min-hash function. We compare the ratio of sample agreements in order to estimate the similarity 
of the two streams. 

Definition 8 (similarity). The (Jaccard) similarity of two streams, X and Y, is given by

$$ S(X, Y) = \frac{|X \cap Y|}{|X \cup Y|}. $$



Recall that for two streams X and Y, a reasonable estimation of S(X, Y) is given by the number
of min-hash values they agree on [22]. In other words, let h_1, h_2, ..., h_k be a family of ε-min-hash
functions and let

$$ \hat{S}(X, Y) = \frac{|\{i \in [k] \text{ s.t. } h_i(X) = h_i(Y)\}|}{k}, $$

then Ŝ(X, Y) ∈ (1 ± ε)S(X, Y) + ε(1 + p), with success probability at least 1 − δ, where p and
δ are determined by k. Based on this fact, Datar et al. [22] showed an algorithm for estimating
similarity in the sliding window model that uses an expected memory of O(k(log(1/δ) + log N)) words,
for a suitable k depending on ε, p and δ. Using smooth-histograms, our algorithm reduces the
expected memory bound into a worst-case bound. The semi-smooth algorithm ApproxSimilarity is
rather straightforward and is given in Figure 3.



ApproxSimilarity(ε, δ)

1. Randomly choose k ε'-min-hash functions, h_1, ..., h_k. The constant ε' will be specified later,
as a function of the desired precision ε.

2. For each stream (X and Y) maintain an (ε', δ/2)-estimation of the number of distinct elements
by building a smooth histogram.

3. For each stream and for each bucket instance A_1, A_2, ..., separately calculate the values of
each of the min-hash functions h_i, i = 1 ... k.

4. Let A_X (A_Y) be the first smooth-histogram bucket that includes the current window W_X
(W_Y) of the stream X (Y). Output the ratio of hash functions h_i which agree on the minimal
value, i.e.,

σ̂(W_X, W_Y) = |{i ∈ [k] s.t. h_i(A_X) = h_i(A_Y)}| / k.



Figure 3: A semi-smooth algorithm for estimating similarity 
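The agreement ratio of step 4 can be illustrated on two fixed sets. The sketch below is a simplification under stated assumptions: it works on whole sets rather than on smooth-histogram buckets, uses explicit random permutations instead of ε'-min-hash functions, identifies elements with integers in {0, ..., u-1}, and the function names are hypothetical.

```python
import random


def jaccard(x_items, y_items):
    """Exact Jaccard similarity |X n Y| / |X u Y| of two non-empty collections."""
    x, y = set(x_items), set(y_items)
    return len(x & y) / len(x | y)


def estimate_similarity(x_items, y_items, u, k, seed=0):
    """Fraction of k random permutations whose minimum over X equals that over Y."""
    rng = random.Random(seed)
    agree = 0
    for _ in range(k):
        pi = list(range(u))
        rng.shuffle(pi)
        if min(pi[a] for a in x_items) == min(pi[a] for a in y_items):
            agree += 1
    return agree / k
```

Because each permutation is a bijection, the two minima agree exactly when the minimizer over X ∪ Y lies in X ∩ Y, which happens with probability |X ∩ Y| / |X ∪ Y| for a uniformly random permutation.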



Theorem 7. The semi-smooth algorithm for estimating similarity (Fig. 3) is an (ε, δ)-approximation
for the similarity problem, up to an additive precision.

See proof in Appendix D.

Memory Usage. Let us summarize the memory consumption of the ApproxSimilarity algorithm.
Maintaining a smooth histogram for DEC: O((log u + 1/ε²)(1/ε) log N log(1/δ) + (1/ε) log² N);
k seeds for (ε/2)-min-hash functions: O(k log(1/ε) log u); keeping the hash value for each h_i:
O((k/ε) log N log u).

Our algorithm improves the currently known expected bound [22] into a worst-case bound of the
same magnitude (up to a log log N term). Taking k = Ω((1/ε²) log(1/δ)) and assuming log u = O(log N),
we achieve a memory bound of O((k/ε) log² N), with O((k/ε) log N) calculations per element, suppressing
poly(log(1/ε), log log N) terms.

5 Conclusions 

To conclude, we have shown the first poly-logarithmic-memory algorithm for identifying L2 heavy-hitters
up to a 1 ± ε precision over sliding windows. Our result supplies another insight into the relation
between the unbounded and sliding window models for the central question of heavy-hitters. As the
L_p-heavy-hitters problem is more difficult for larger p, and for p > 2 there cannot exist a
poly-logarithmic solution, our algorithm provides a solution for the strongest L_p norm with small memory.

Although our main concern was the L2 norm, the algorithm can easily be extended to any L_p
with 0 < p < 2. Moreover, a poly-logarithmic approximation of the top-k problem in the sliding window
model is immediate using our methods.

We have demonstrated that the tools shown in this paper can be applied to many other properties, 
if there exists a smooth function which is correlated to our target function. We have shown how 
to employ the same techniques in order to obtain a sliding window algorithm for the similarity and
α-rarity problems, which yields essentially the same memory bound as the current state of the art,
yet the bound is for the worst case scenario. We expect that our method can improve the memory 
efficiency of many sliding-window algorithms when applied to other non-smooth properties. 

Appendix

A The CountSketch_b algorithm

In this section we describe the CountSketch_b algorithm and prove several of its properties. Define
(γ, ε', δ)-CountSketch_b for 0 < ε', γ, δ < 1 as the algorithm CountSketch defined in [14], setting
k = 1/γ² + 1 and limiting the memory usage by using the parameter b = 256/(γ²ε'²). The choice of k
follows from the following known fact.

Lemma 8. There are at most 1/γ² elements with frequency higher than γL2.

Proof. Assume that there are m elements with frequency higher than γL2. It follows that

$$ L_2 = \Bigl(\sum_{j=1}^{u} n_j^2\Bigr)^{1/2} > \sqrt{m}\,\gamma L_2. $$

Clearly, m < 1/γ². □

Setting k = 1/γ² + 1 ensures that the outputted list is large enough to contain all the elements with
frequency γL2 or more.
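The counting bound of the lemma is easy to check numerically on a concrete frequency vector. The helpers below are a minimal sketch with hypothetical names; they compute the L2 norm of the frequency vector exactly and list the elements whose frequency exceeds γL2.

```python
import math
from collections import Counter


def l2_norm(stream):
    """L2 norm of the frequency vector of the stream."""
    return math.sqrt(sum(c * c for c in Counter(stream).values()))


def heavy_elements(stream, gamma):
    """Elements whose frequency is strictly higher than gamma * L2."""
    threshold = gamma * l2_norm(stream)
    return {x for x, c in Counter(stream).items() if c > threshold}
```

For frequencies 50, 30, 20 plus one hundred singletons, L2 = √3900 ≈ 62.4, so with γ = 0.3 the threshold is about 18.7 and exactly three elements are heavy, well below the bound 1/γ² ≈ 11.1.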

However, CountSketch_b no longer guarantees to output all the elements with frequency
higher than (1 + ε')φ_k and no element of frequency less than (1 − ε')φ_k (Lemma 5 of [14]), since the
value of b might not satisfy the conditions of that lemma.

We can still follow the analysis of [14] and claim that the frequency approximation of each element
is still bounded (Lemma 4 of [14]).

Lemma 9. With probability 1 − δ, for all elements i ∈ [u] in the stream S,

$$ |\hat{n}_i - n_i| < \frac{8 L_2(S)}{\sqrt{b}} \le \frac{1}{2}\epsilon'\gamma L_2(S), $$

where n̂_i is the approximated frequency of i calculated by CountSketch_b, and n_i is the real frequency
of the element i.
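For readers unfamiliar with the underlying data structure, the following is a minimal sketch of a CountSketch-style frequency estimator: rows of b signed counters, with the estimate taken as the median over rows. It is an illustration under stated assumptions and not the algorithm of [14] verbatim: elements are non-negative integers, the hash functions are simple random affine maps modulo a Mersenne prime, the heap of the k heaviest candidates is omitted, and the class name and its parameters are hypothetical.

```python
import random
import statistics


class CountSketch:
    """Rows of signed counters; the frequency estimate is the median over rows."""

    _PRIME = (1 << 61) - 1  # Mersenne prime for the affine hash family (an assumption)

    def __init__(self, rows, buckets, seed=0):
        rng = random.Random(seed)
        self.buckets = buckets
        self.table = [[0] * buckets for _ in range(rows)]
        # Per row: (a, b) drive the bucket hash, (c, d) drive the sign hash.
        self.params = [tuple(rng.randrange(1, self._PRIME) for _ in range(4))
                       for _ in range(rows)]

    def _bucket_and_sign(self, row, x):
        a, b, c, d = self.params[row]
        bucket = ((a * x + b) % self._PRIME) % self.buckets
        sign = 1 if ((c * x + d) % self._PRIME) & 1 else -1
        return bucket, sign

    def add(self, x, count=1):
        for row in range(len(self.table)):
            bucket, sign = self._bucket_and_sign(row, x)
            self.table[row][bucket] += sign * count

    def estimate(self, x):
        values = []
        for row in range(len(self.table)):
            bucket, sign = self._bucket_and_sign(row, x)
            values.append(sign * self.table[row][bucket])
        return statistics.median(values)
```

With b = 256 counters per row, the error term 8·L2(S)/√b of the lemma equals L2(S)/2, which is the tolerance used when comparing an estimate against the true frequency.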

The above lemma allows us to bound the frequencies of the outputted elements.

Proposition 10. The (γ, ε', δ)-CountSketch_b algorithm does not output any element with frequency
less than (1 − ε')γL2(S).

Proof. There exist k elements which satisfy n_i ≥ φ_k. The estimated frequency of those elements
satisfies n̂_i ≥ n_i − ½ε'γL2(S) ≥ φ_k − ½ε'γL2(S). Every element with frequency less than φ_k − ε'γL2(S)
would have a lower frequency estimation than the above k elements, and thus could not be in the
outputted list. For k = 1/γ² + 1, Lemma 8 suggests that φ_k ≤ γL2(S) and thus φ_k − ε'γL2(S) ≤
(1 − ε')γL2(S). □

Proposition 11. The (γ, ε', δ)-CountSketch_b algorithm outputs all elements with frequency at least
(1 + ε')γL2(S).

Proof. An element is not in the outputted list only if there are (at least) k elements with higher
approximated frequency. Due to Lemma 9, any element i with frequency n_i ≥ (1 + ε')γL2(S) has an
estimated frequency of at least n̂_i ≥ (1 + ε'/2)γL2(S), so it can be replaced only by an element with
frequency higher than γL2(S). However, there are at most k elements with n_j > γL2(S), specifically,
at most k − 1 elements other than i itself, which completes the proof. □

The memory consumption of CountSketch^ is bounded by 0({k + b) log ^)) [13], which in our 
case gives 0(^^ log 

B Proof of Theorem 2

Theorem 2. The semi-smooth algorithm ApproxFreqElements (Fig. 1) is a (γ, ε)-approximation
of the L2-frequent elements problem, with success probability at least 1 − δ.

I.e., there exists a constant c such that for a small enough ε, the algorithm returns a list which
includes all the elements with frequency at least (1 + cε)γL2(W) and no element with frequency lower
than (1 − cε)γL2(W).

Proof. Recall that the smooth-histogram data structure for L2 guarantees an estimation L̂2 which
is (1 ± ε)L2(W); in addition there exists some α such that (1 − α)L2(A_1) ≤ L2(W) ≤ L2(A_1). In our
case the inequality is satisfied for α = ε/2 (see Theorem 3 and Definition 3 in [9]). Any element j with
frequency n_j(W) ≥ (1 + ε)γL2(W) satisfies

$$ n_j(A_1) \ge n_j(W) \ge (1+\epsilon)\gamma L_2(W) \ge (1+\epsilon)(1-\epsilon/2)\gamma L_2(A_1), $$

and will be outputted by the CountSketch_b step of the algorithm, since Proposition 11 guarantees that
any element i such that n_i(A_1) ≥ (1 + ε/4)γL2(A_1) is outputted by CountSketch_b (assuming ε ≤ 1/2).

In order to show that all of the required elements survive the thresholding step, we use Lemma 9 to
bound the estimated frequency n̂_i reported by CountSketch_b and show it is above the required
threshold. If n_i(W) ≥ (1 + ε)γL2(W) then



hi(A) > m(A) - € -- 1 L 2 {A) > m(W) - -^L 2 (W) > 



1 + e 



jL 2 (W) 



recalling that L̂2 ≤ (1 + ε)L2(W) implies that the element survives the thresholding step.

While we are guaranteed that all the (1 + ε)γL2(W)-frequent elements appear in the outputted
list, it might also contain many other elements that are not heavy enough. We now prove that
the thresholding step eliminates any element of frequency less than (1 − cε)γL2(W), for a constant c.

Lemma 12. If for an element i there exists some ζ > √ε such that n_i(A_1) ≥ ζL2(A_1), then there
exists a constant ξ > 0 such that n_i(W) ≥ ξL2(W).

Proof. By the properties of the smooth-histogram,

$$
\begin{aligned}
L_2(W)^2 &\ge (1-\epsilon/2)^2 L_2(A_1)^2 \ge (1-\epsilon) L_2(A_1)^2, \\
\sum_{j \ne i} n_j(W)^2 + n_i(W)^2 &\ge n_i(A_1)^2 + \sum_{j \ne i} n_j(A_1)^2 - \epsilon L_2(A_1)^2, \\
n_i(W)^2 &\ge n_i(A_1)^2 - \epsilon L_2(A_1)^2 \ge (\zeta^2 - \epsilon) L_2(A_1)^2,
\end{aligned}
$$

and n_i(W) ≥ ξL2(W) for ξ ≤ √(ζ² − ε). □
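Assume that an element i survived the thresholding step, thus n̂_i(A_1) ≥ (1/(1 + ε))γL̂2 ≥
((1 − ε)/(1 + ε))γL2(W). By Lemma 9, with ε' = ε/4 denoting the precision parameter of CountSketch_b,

$$ n_i(A_1) \ge \hat{n}_i(A_1) - \frac{\epsilon'}{2}\gamma L_2(A_1) \ge \Bigl[\frac{(1-\epsilon)(1-\epsilon/2)}{1+\epsilon} - \frac{\epsilon'}{2}\Bigr]\gamma L_2(A_1) \ge (1-3\epsilon)\gamma L_2(A_1), $$

and by Lemma 12, n_i(W) ≥ √(1 − 7ε) · γL2(W). This proves that for small enough ε there exists
some constant c such that the algorithm does not output any element with frequency lower than
(1 − cε)γL2(W).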

To conclude, except for probability δ/2 we are able to partition the stream into L2-smooth buckets,
and except for probability δ/2, the CountSketch_b algorithm outputs a list which can be used to
identify the frequent elements of the window. Using the union bound we conclude that the entire
algorithm succeeds except for probability δ. This completes the proof of the theorem. □



C Proof of Theorem 6

Theorem 6. The semi-smooth algorithm (Fig. 2) is an (ε, δ)-approximation for the α-rarity problem,
up to an additive precision.

Proof. For the sake of simplicity we treat the multisets A_1, W, etc., as sets. Let R_α be the set of
elements which are α-rare in the window W. Following the min-hash similarity lemma of Section 4.1
with R_α ⊆ A_1,

$$ \Pr[h_i(R_\alpha) = h_i(A_1)] = \frac{|R_\alpha \cap A_1|}{|R_\alpha \cup A_1|} \pm \frac{\epsilon}{2} = \frac{|R_\alpha|}{|A_1|} \pm \frac{\epsilon}{2}. $$

The algorithm outputs an approximation of Pr[L_i(A_1) consists of exactly α active elements], which
equals Pr[h_i(A_1) = h_i(R_α)], since h_i(A_1) = h_i(R_α) if and only if L_i(A_1) consists of α active elements.
Let x_i be the element which minimizes h_i on A_1, h(x_i) = h(A_1). If the number of active elements in
L_i(A_1) is not α, then x_i ∉ R_α, thus h(A_1) ≠ h(R_α). For the other direction, if h_i(A_1) = h_i(R_α) then
L_i(A_1) counts the number of occurrences of x_i in the bucket, and since x_i ∈ R_α, it appears exactly α
times within the window.

We build a smooth-histogram for DEC by using the algorithm of Kane et al. [33] as an
approximation of DEC for the unbounded model (see Theorem 3 in [9]). The smooth-histogram guarantees¹
that (1 − ε)|A_1| ≤ |W| ≤ |A_1|, thus

$$ \frac{|R_\alpha|}{|A_1|} \le \frac{|R_\alpha|}{|W|} = \rho_\alpha, \qquad \frac{|R_\alpha|}{|A_1|} \ge (1-\epsilon)\frac{|R_\alpha|}{|W|} \ge (1-\epsilon)\rho_\alpha. $$

¹ Actually, it guarantees an even better bound, specifically, (1 − ε/2)|A_1| ≤ |W| ≤ |A_1|.

Therefore, estimating the ratio ρ_α using k hash functions results in a value (1 ± ε)ρ_α ± ε/2, up to
some additive error ε̃ determined by k. Finally, using Chernoff's inequality we can bound the additive
error so that ε̃ ≤ ε/2, except for probability δ/2. In order to achieve the desired precision we require
k = Ω((1/ε²) log(1/δ)), and the estimation satisfies

$$ \hat{\rho}_\alpha \in (1 \pm \epsilon)\rho_\alpha \pm \epsilon, $$

except for probability at most δ. This concludes the correctness of the algorithm. □
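To get a feel for the magnitude of k, one can plug numbers into a concrete concentration bound. The helper below is a sketch that uses Hoeffding's inequality for the average of k independent indicators, P(|average − mean| ≥ ε) ≤ 2·exp(−2kε²), as a stand-in for the Chernoff argument; the constants in the paper's analysis may differ, and the function name is hypothetical.

```python
import math


def num_hash_functions(eps, delta):
    """Smallest k with 2 * exp(-2 * k * eps**2) <= delta (Hoeffding's inequality)."""
    return math.ceil(math.log(2 / delta) / (2 * eps ** 2))
```

For an additive error of 0.1 with failure probability 0.05 this gives k = 185, and halving the error roughly quadruples k, in line with the Ω((1/ε²) log(1/δ)) requirement.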



D Proof of Theorem 7

Theorem 7. The semi-smooth algorithm for estimating similarity (Fig. 3) is an (ε, δ)-approximation
for the similarity problem, up to an additive precision.



Proof. Following Lemma 



For convenience, we once again treat the buckets A_X, A_Y, W_X, W_Y as sets. Notice that we can
write A_X = W_X ∪ (A_X \ W_X) and that 0 ≤ |A_X \ W_X| ≤ (ε'/(1 − ε'))|W_X|, which follows from the
guarantee of the smooth-histogram that (1 − ε')|A_X| ≤ |W_X| ≤ |A_X| (and the same for A_Y and W_Y).
Using elementary set operations, we can estimate |W_X ∪ W_Y| using |A_X ∪ A_Y|,

$$
\begin{aligned}
|A_X \cup A_Y| &\le |W_X \cup W_Y| + \frac{2\epsilon'}{1-\epsilon'}|W_X \cup W_Y| \\
&= \frac{1+\epsilon'}{1-\epsilon'}|W_X \cup W_Y|.
\end{aligned}
$$



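In addition, any two sets S, Q satisfy |S ∩ Q|/|S ∪ Q| = (|S| + |Q|)/|S ∪ Q| − 1; thus the similarity
estimation satisfies

$$
\begin{aligned}
\frac{|A_X \cap A_Y|}{|A_X \cup A_Y|} &= \frac{|A_X| + |A_Y|}{|A_X \cup A_Y|} - 1 \\
&\le \frac{\frac{1}{1-\epsilon'}|W_X| + \frac{1}{1-\epsilon'}|W_Y|}{|W_X \cup W_Y|} - 1
= \frac{1}{1-\epsilon'}\cdot\frac{|W_X \cap W_Y|}{|W_X \cup W_Y|} + \frac{\epsilon'}{1-\epsilon'},
\end{aligned}
$$

and

$$ \frac{|A_X \cap A_Y|}{|A_X \cup A_Y|} \ge \frac{|W_X \cap W_Y|}{\frac{1+\epsilon'}{1-\epsilon'}|W_X \cup W_Y|} = \frac{1-\epsilon'}{1+\epsilon'}\cdot\frac{|W_X \cap W_Y|}{|W_X \cup W_Y|}. $$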



Finally, setting ε' ≤ ε/2 gives an estimation σ̂(W_X, W_Y) ∈ (1 ± ε)S(W_X, W_Y) ± 3ε/2, up to an
additional additive error, which can be arbitrarily decreased using Chernoff's bound, by increasing k.
Specifically, this additional error is bounded by O(ε) when k = Ω((1/ε²) log(1/δ)), with success
probability at least 1 − O(δ). □